
feat(examples): add custom HTTP embedding example for LM Studio / Ollama #149

Open
cluster2600 wants to merge 7 commits into alibaba:main from cluster2600:feat/lmstudio-custom-http-embedding

Conversation

@cluster2600
Contributor

Summary

This PR adds a self-contained example showing how to use any OpenAI-compatible HTTP embedding endpoint (LM Studio, Ollama, vLLM, LocalAI, …) as the embedding source in zvec.

What's added

examples/custom_http_embedding.py

  • HTTPEmbeddingFunction: a zero-dependency class (stdlib only) that calls any /v1/embeddings endpoint, caches results with @lru_cache, and satisfies the DenseEmbeddingFunction protocol.
  • Collection setup: an HNSW index with cosine similarity; the dimension is auto-detected from the server response.
  • Insert + query: 5 sample documents embedded on the fly, followed by a semantic search query.
  • CLI interface: --base-url, --model, --api-key, and --collection-path flags for easy customisation.
  • README-style header: step-by-step instructions for LM Studio and Ollama embedded at the top of the file.
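The pieces listed above can be sketched roughly as follows. This is an illustrative approximation, not the PR's actual code: the class name follows the PR description, but the default URL/model values and the parse_embedding helper are assumptions.

```python
import json
import urllib.request
from functools import lru_cache


def parse_embedding(response: dict) -> list:
    """Pull the first embedding vector out of an OpenAI-style response body."""
    data = response.get("data")
    if not data or "embedding" not in data[0]:
        raise ValueError(f"malformed embedding response: {response!r}")
    return list(data[0]["embedding"])


class HTTPEmbeddingFunction:
    """Zero-dependency client for an OpenAI-compatible /v1/embeddings endpoint."""

    def __init__(self, base_url="http://localhost:1234",
                 model="nomic-embed-text", api_key=None):
        self.base_url = base_url.rstrip("/")
        self.model = model
        self.api_key = api_key

    @lru_cache(maxsize=1024)  # caches per (self, text); self must stay hashable
    def embed(self, text: str) -> tuple:
        payload = json.dumps({"model": self.model, "input": text}).encode("utf-8")
        headers = {"Content-Type": "application/json"}
        if self.api_key:
            headers["Authorization"] = f"Bearer {self.api_key}"
        req = urllib.request.Request(
            f"{self.base_url}/v1/embeddings", data=payload, headers=headers
        )
        with urllib.request.urlopen(req) as resp:
            body = json.load(resp)
        # return a tuple: immutable, so safe to hand out from the lru_cache
        return tuple(parse_embedding(body))

    @property
    def dimension(self) -> int:
        # probe the server once; lru_cache remembers the result afterwards
        return len(self.embed("dimension probe"))
```

The same urllib pattern works against LM Studio, Ollama, vLLM, or LocalAI, since all expose the same response shape (`{"data": [{"embedding": [...]}]}`).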

Usage

# LM Studio (default)
python examples/custom_http_embedding.py

# Ollama
python examples/custom_http_embedding.py \
    --base-url http://localhost:11434 \
    --model nomic-embed-text

Motivation

The existing extensions (OpenAIDenseEmbedding, etc.) require the openai package and are primarily designed for cloud APIs. Many developers want to use local inference servers without extra dependencies. This example shows the pattern using only Python stdlib, making it easy to adapt or inline.

Testing

The example runs end-to-end against a live LM Studio instance on localhost:1234. No new test infrastructure is required for a standalone script.

@CLAassistant

CLAassistant commented Feb 19, 2026

CLA assistant check
All committers have signed the CLA.

@Cuiyus
Collaborator

Cuiyus commented Feb 26, 2026

Thank you for your submission!

A service-style invocation model like this, which helps zvec deliver RAG capability, is something we currently lack.
Could you implement your HTTPEmbeddingFunction (and an OllamaEmbeddingFunction) by inheriting from DenseEmbeddingFunction in the python/zvec/extension directory? That will make it easier for more users to adopt!

https://zvec.org/api-reference/python/extension/#zvec.extension.DenseEmbeddingFunction
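The suggested structure might look roughly like this. Note that the base class below is only a local stand-in for zvec's real DenseEmbeddingFunction (whose exact interface is an assumption here), and the transport parameter is an illustrative device so the embed path can be exercised without a live server:

```python
from abc import ABC, abstractmethod


# Local stand-in for zvec.extension.DenseEmbeddingFunction; the real base
# class's exact interface is an assumption in this sketch.
class DenseEmbeddingFunction(ABC):
    @abstractmethod
    def embed(self, text: str) -> list:
        ...


class HTTPDenseEmbedding(DenseEmbeddingFunction):
    """Skeleton of the suggested extension class."""

    def __init__(self, model: str, transport):
        self.model = model
        # transport(model, text) -> list of floats; a real implementation
        # would default to POSTing against the local /v1/embeddings endpoint
        self._transport = transport

    def embed(self, text: str) -> list:
        return self._transport(self.model, text)


# usage with a fake transport instead of a running server
emb = HTTPDenseEmbedding("nomic-embed-text", lambda model, text: [0.1, 0.2, 0.3])
print(emb.embed("hello"))  # [0.1, 0.2, 0.3]
```

Injecting the transport keeps the class unit-testable offline, which matters for a file living under python/zvec/extension rather than examples/.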

@cluster2600
Contributor Author

Thanks for the feedback! Moved the implementation into python/zvec/extension/http_embedding_function.py as HTTPDenseEmbedding, inheriting from DenseEmbeddingFunction. It's now exported from zvec.extension and the example imports it from there instead of defining the class inline.

Move the HTTP embedding implementation from the example script into
python/zvec/extension/ as HTTPDenseEmbedding, inheriting from
DenseEmbeddingFunction. The example now imports from zvec.extension
instead of defining the class inline.

Signed-off-by: Maxime <maxime@cluster2600.com>
Signed-off-by: Maxime Grenu <maxime.grenu@gmail.com>
@cluster2600 force-pushed the feat/lmstudio-custom-http-embedding branch from 31f67ed to 9a81b28 on February 26, 2026 at 09:11
Move zvec imports to top-level, add noqa for print statements,
replace os.path.exists with pathlib, fix import sorting.

Signed-off-by: Maxime <maxime@cluster2600.com>
Signed-off-by: Maxime Grenu <maxime.grenu@gmail.com>
The vector_column_indexer_test failure is a known flaky assertion in
hnsw_streamer_entity.h, unrelated to Python-only changes in this PR.

Signed-off-by: Maxime <maxime@cluster2600.com>
Signed-off-by: Maxime Grenu <maxime.grenu@gmail.com>
@Cuiyus
Collaborator

Cuiyus commented Feb 27, 2026

@greptile

@greptile-apps

greptile-apps bot commented Feb 27, 2026

Greptile Summary

This PR adds HTTPDenseEmbedding, a stdlib-only implementation for OpenAI-compatible embedding endpoints (LM Studio, Ollama, vLLM, LocalAI). The implementation follows established project patterns including @lru_cache for result caching and the DenseEmbeddingFunction protocol.

Key additions:

  • HTTPDenseEmbedding class with automatic dimension detection
  • Comprehensive example with CLI interface for easy testing
  • Zero external dependencies (uses only urllib.request and stdlib)
  • Proper error handling for network failures and malformed responses
  • Consistent with existing embedding function implementations

The code is production-ready, well-documented, and provides valuable functionality for users running local inference servers.
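The "automatic dimension detection" mentioned above amounts to a lazy, one-time probe of the server: embed a throwaway string, measure the vector length, and remember it. A minimal stand-alone sketch (DimensionProbe and fake_embed are illustrative names, not the PR's code):

```python
class DimensionProbe:
    """Detect the vector dimension from the first embedding response and
    cache it, so repeated reads of .dimension cost only one server call."""

    def __init__(self, embed_fn):
        self._embed = embed_fn  # embed_fn(text) -> sequence of floats
        self._dim = None

    @property
    def dimension(self) -> int:
        if self._dim is None:
            self._dim = len(self._embed("dimension probe"))
        return self._dim


# usage with a stand-in embedder instead of a live server
calls = []
def fake_embed(text):
    calls.append(text)
    return [0.0] * 768

probe = DimensionProbe(fake_embed)
print(probe.dimension)  # 768
print(probe.dimension)  # still 768, served from the cached value
print(len(calls))       # 1 -- the server was only hit once
```

This matches the flow in the sequence diagram below, where the dimension property triggers a single POST whose result is then cached.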

Confidence Score: 5/5

  • This PR is safe to merge with no identified issues
  • The code follows established project patterns, includes comprehensive documentation and error handling, and has been tested against live servers. No breaking changes or security concerns identified.
  • No files require special attention

Important Files Changed

Filename Overview
python/zvec/extension/http_embedding_function.py Added new HTTPDenseEmbedding class implementing stdlib-only OpenAI-compatible embedding endpoint support with automatic dimension detection and LRU caching
examples/custom_http_embedding.py Added comprehensive example demonstrating HTTPDenseEmbedding usage with LM Studio/Ollama, including CLI interface and end-to-end workflow
python/zvec/extension/__init__.py Added export for new HTTPDenseEmbedding class to extension module's public API

Sequence Diagram

sequenceDiagram
    participant User
    participant HTTPDenseEmbedding
    participant Cache
    participant Server as Local Server<br/>(LM Studio/Ollama)
    
    User->>HTTPDenseEmbedding: __init__(base_url, model)
    HTTPDenseEmbedding->>HTTPDenseEmbedding: Store config
    
    User->>HTTPDenseEmbedding: dimension property
    HTTPDenseEmbedding->>HTTPDenseEmbedding: embed("dimension probe")
    HTTPDenseEmbedding->>Server: POST /v1/embeddings
    Server-->>HTTPDenseEmbedding: {data: [{embedding: [...]}]}
    HTTPDenseEmbedding->>Cache: Store result
    HTTPDenseEmbedding-->>User: vector dimension
    
    User->>HTTPDenseEmbedding: embed("user text")
    HTTPDenseEmbedding->>Cache: Check cache
    alt Cache hit
        Cache-->>HTTPDenseEmbedding: Cached vector
    else Cache miss
        HTTPDenseEmbedding->>Server: POST /v1/embeddings
        Server-->>HTTPDenseEmbedding: {data: [{embedding: [...]}]}
        HTTPDenseEmbedding->>Cache: Store result
    end
    HTTPDenseEmbedding-->>User: vector

Last reviewed commit: eb3960e

Per maintainer feedback, examples requiring an external LLM server
belong in the zvec-web project rather than in this repository.

Signed-off-by: Maxime Grenu <maxime.grenu@gmail.com>
@cluster2600
Contributor Author

Removed the example file as requested — server-dependent examples belong in zvec-web.

